So this week I'm visiting family and participating in a game jam. Go me. Here's the dev journal/entry page.
Would anyone be willing to help me a little with my homework in Python? I'm taking a course in Python for biologists, which mainly applies to genetics. I'm not a geneticist, I'm in this course because I like suffering, I guess. So guess what I'm doing? Suffering. The first two assignments for the course were pretty easy once I figured out a couple of things, but this current one is kicking my ass. I'm pretty certain it's going to get turned in late anyway, but I'm going to keep trying for now. The first part is to read a text file with 45 rows, each with a restriction enzyme name and a cut site separated by a tab, and split the information into a list of enzymes and a list of cut sites. So far, I've gotten it to give me 45 lists with 2 elements in each, when I need 2 lists with 45 elements each (and, assumably, those elements associated with each other by their position in the lists). Later I'm going to use for loops to count how many times each enzyme shows up in a given sequence, and also find the position of the first nucleotide of the cutsite. I want to say I could figure that part out, but I can't even get to it until I get the first part done, well, first.
If I'm interpreting the thing correctly you mean something like Enzymes = [] Cutsites = [] For line in file: --E = line.readword()* --C = line.readword() --Enzymes.append(e) --Cutsites.append(c) ? *however reading works idk E: -- being spaces
it sounds like you actually want a two-dimensional array. (aka, an array in an array, because what is sanity?) does this help? Or maybe this? (Dunno what version of Python you're using) (Also I do not know python beyond the super basics, but I know some c# and I use a ton of Powershell at work)
@iff I feel a little silly but what is readword for there? A placeholder for a method? @Boots The thing is that we haven't covered arrays, and the requirements of the homework don't say anything about them, so I don't think I'm supposed to need to use them? I'm still fiddling around with this, I'll let you know if I come up with anything.
Yes, whatever method returns the restriction enzyme name and the cut site It may look more like E, C = line.split('\t') ? (I think there aren't arrays in Python but lists are similar so like two-dimensional lists)
@iff What you said was closest, I was able to ask about it in class today. It ended up being: enzymes = [] cutSites = [] for line in file: (Name, Seq) = line.split("\t") Seq = Seq.replace("\n","") enzymes.append(Name) cutSites.append(Seq) With (Name, Seq) = line.split("\t") being the part that was completely eluding me. @~@ I didn't know you could just... put it into more than one thing like that.
Ooooooh ok for some reason I thought the problem was the appending part xd You can only do it in some languages! (I'm... pretty sure?) Otherwise you could do it like X = line.split("\t") Name = X[0] Seq = X[1]
That's what I initially tried to do, and I'm actually wondering now if that would have worked...? Maybe. I'll let y'all know if I ever end up doing anything fun with this class! Today's assignment (taking a DNA sequence and using a function to generate a reverse complementary sequence) was a jillion times easier, thankfully. I finished it in like 15 minutes.
SO, I'd actually be super interested in other assignments that you have? Like...I kind of want to try to figure them out, because I learn programming best when I have a project, and even though I don't really know anything about biology, I imagine the programming concepts would be pretty universal, and now is as bad a time as any to pick up some more python :P (also I sincerely apologise for that travesty of a run-on sentence.)
@Boots I would be fine with letting you know, although I would warn that this particular course assumes you know a good deal of biology (or have excellent recall of basic biology). Annoyingly for me but perhaps fortunate for you, after I bought the book we're using, I found that it's pretty much entirely online. That links to the fourth chapter on lists and loops. It has a lot of biology based examples, too. If it seems like something fun for you, I would be happy to send you the files necessary for our homework assignments for you to play around with. :) They're mostly DNA sequences.
That actually does sound interesting! I have a vague recollection of basic biology, but I'm pretty good at googling stuff, so if I need to understand the biology part to get the coding part, I should be good. Thanks! :D (Also, don't you hate that? Teacher says you need a book, says you can get it at the bookstore for $324589732645.00, and then you find comparable information online/you never touch the book at all because it's actually for "supplemental reading")
Cool! I'll PM you to chat about the best way to send files. :) (Normally, yes, but I don't think the prof actually knew this existed, and I found it on accident. AFAIK, this is the book verbatim. All in all, I'm pleased there's text I can search now instead of looking through book pages and getting all @~@ )
Hey, professional Pythonista, I know this is days late, but I am always down for helping teach Python. PM and I can give Skype and or Email.
@ADigitalMagician I actually have your skype somewhere? Someone mentioned maybe asking you for help but I felt weird asking for help out of the blue. But I will keep this in mind, thank you! I will identify myself if I message you.
So.. I can explain what a two-dimensional (or even n-dimensional, yeah you can do that too) array is based on, but it requires you to know what a reference type is.
Reference types are objects in memory, right? As opposed to pointers which refer to a location in memory? As an analogy, a reference type would be a single book in the library, while a pointer would be the shelf on which the book is stored? (also, I am interested in your explanation, even though it might make a nice whistling sound as it flies over my head :P )
SfdsiLFs? I thought pointers were reference types. *looks it up* Ohhh okay so it does depend a bit on the language you're working with but it's more like a reference is a specific IBSN and a pointer is a container that holds any ISBN, but the thing I was trying to get at is that you can have a type that does nothing but refer to something else. Which you clearly do get. Also just a warning I'm going to be talking about C because that's the language I'm most familiar with. I'd assume Python does similar things in memory because Python was derived from C++, which was derived from C. So I think of two-dimensional arrays as an array of pointers. It's like you have an array, and that array is nothing more than a bunch of references to other areas of memory. You can do this with other objects, but you can also do this with arrays because arrays are just an area of memory. And you can imitate a two-dimensional array by declaring an array of pointers and filling it with pointers to arrays, like so: To me this makes it very easy to wrap your head around how you can get to n-dimensional arrays; each dimension is just another nesting. The next part is a little easier to understand if you understand that array types in C are basically pointers to the first element of the array. The only data C retains about the array once it's declared is the type it was declared as and the pointer. The type it was declared as is used to measure the step it needs to take to access the next element of the array. So when you declare an array of type int with a size of 5, and then ask for element 3 of that array, the computer takes that pointer to the first element, adds 3 times the size of the array's type, in this case an integer, to the pointer, and dereferences that pointer. If your array actually doesn't have 4 elements, that is was declared as having a size of 3 or less, C doesn't care, it doesn't retain that information, it dereferences the pointer and accesses whatever's 3 times the size of an integer from the first address in the array, which could be anything from a bunch of stuff that's allocated elsewhere in your program to the next instructions in your program to a vital part of memory that allows your computer to boot. (This particular error is called a buffer overflow just btw.) Got it? This is a little complicated. Now, an array of references are not actually how multi-dimensional arrays are declared in C. How the compiler allocates the memory for a multi-dimensional array is it multiplies all the dimensions of the array together, allocates that much memory as one big chunk, and treats the different array subscripts as references to different areas of the huge array. So if you declare a 2-dimensional array that is 4x5, it will allocate 20 slots and say the first 5 belong to array[0], the next 5 belong to array[1] and so on. In the same way that C uses a pointer to a one-dimensional array to access the elements by multiplying the array subscript by the size of the type contained, C takes the first array subscript in a two-dimensional array, multiplies it by the size of the entire array, and adds it to the pointer it has to element [0][0]. Then it goes on to do the same with the second array subscript, and finally dereferences that pointer. So multi-dimensional arrays are essentially just packaging on top of the same basic type. Which is definitely useful, since that's basically also why we use higher-level languages instead of just programming in Assembly or binary. But at its core it's all just pointer arithmetic. Did that make any sense at all?
Actually, yeah! The pictures helped, because I do a lot of Powershell (which is written in C#, which makes it like the step-great-grandchild of C?) and your pictures are pretty much exactly how I visualize arrays. But I never really knew how they worked on the back-end, I just knew how to make them do the thing, if that makes sense. I'm probably going to bookmark this and add it to my ever growing pile of "Things that help translate Boots' brain pictures into English". Thanks!